for the Gaussian Channel with Two-sided
Input-Noise Dependent State Information

Abstract

This paper considers a new, general version of the Gaussian channel with two-sided state information that is correlated with the channel input and the noise. By determining a general achievable rate for the channel and obtaining its capacity in a non-limiting case, we analyze and solve the Gaussian version of the Cover-Chiang theorem, an open problem, both mathematically and information-theoretically. Our capacity theorem includes all previous theorems as special cases and also explains situations that they cannot analyze; for example, the effect of the correlation between the side information and the channel input on the capacity, which cannot be analyzed with Costa’s “writing on dirty paper” theorem. We also introduce a new idea: describing the “cognition” of a communicating object (transmitter, receiver, relay and so on) on some variable (channel noise, interference and so on) by the information-theoretic concept of “side information” that is correlated with that variable and known by the object. According to our theorem, the channel capacity is an increasing function of the mutual information between the side information and the channel noise. Our channel and its capacity theorem therefore exemplify the “cognition” of the transmitter and receiver on the channel noise under this new description, and the capacity theorem has interesting interpretations that originate from this idea.

Index Terms 

Gaussian channel capacity, correlated side information, two sided state information, transmitter cognition, receiver 
cognition. 


I. Introduction 

Channels with side information have been actively studied since they were introduced by Shannon [?]. Coding for computer memories with defective cells was studied by Kusnetsov and Tsybakov [?]. Gel’fand and Pinsker (GP) [?] determined the capacity of channels with channel side information (CSI) known non-causally at the transmitter. Heegard and El Gamal [?] obtained the capacity when the CSI is known only at the receiver. Cover and Chiang [?] extended these results to the general case where correlated two-sided state information is available at the transmitter and at the receiver. Costa [?] obtained an interesting result by carefully investigating the GP theorem for the Gaussian channel: he proved that the capacity of the Gaussian channel with an interference known at the transmitter is the same as the capacity of the interference-free channel. There are many other important studies in the literature, e.g. [?]-[?]. The results for the single-user channel have also been generalized to multi-user channels, at least in special cases [?]-[?].

Fig. 1. Gaussian channel with additive interference known non-causally at the transmitter.


Our Motivations 

In this paper, we focus on the Gaussian channel with side information for two main aims. First, we analyze, mathematically and information-theoretically, the capacity of the Gaussian channel with two-sided state information, that is, the Gaussian version of the Cover-Chiang theorem [?]. Second, we try to present an improved information-theoretic description of the “cognition” of the transmitter and/or receiver.

First motivation: We try to analyze the Gaussian version of the Cover-Chiang unifying theorem [?]. The effect of side information at the transmitter of a Gaussian channel was first studied, in a special case, in Costa’s “writing on dirty paper” [?]. Consider a Gaussian channel with side information known non-causally at the transmitter, as depicted in Fig. 1. We denote the side information at the transmitter, the channel input, the channel output, the channel noise and the auxiliary random variable at the transmitter by $S_1$, $X$, $Y$, $Z$ and $U$, respectively. Moreover, $S_1$ and $Z$ are assumed to be Gaussian random variables with powers $Q_1$ and $N$, respectively, and $X$ satisfies the power constraint $E\{X^2\} \le P$.

Costa [?] shows that the capacity of this channel is, surprisingly, the same as the capacity of the channel without side information. An important assumption in Costa’s theorem is that the channel definition places no restriction on the correlation between $X$ and $S_1$. However, Costa shows that the maximum rate is obtained when $X$ and $S_1$ are independent and $U$ is a linear function of $X$ and $S_1$. Hence, his theorem applies only to cases where $X$ and $S_1$ are free to be uncorrelated. A theorem that gives the capacity of Gaussian channels with a specified correlation between $X$ and $S_1$ is therefore important both theoretically and practically. One example of correlated input and side information is the cognitive interference channel, in which the sequence sent by one transmitter is a known interference for the other transmitter and the two sequences may depend on each other. Another example is a measurement system in which the measuring signal may affect the system under measurement; this is equivalent to an interfering signal that depends on the original measuring signal.

Fig. 2. Gaussian channel with correlated side information known non-causally at the transmitter and at the receiver.

Another related question concerns the side information $S_2$ known non-causally at the receiver (when it exists, as in Fig. 2). How does the receiver’s knowledge of $S_2$, which is correlated with $(X, S_1)$, affect the channel capacity? And how much does the information about $X$ and $S_1$ that the receiver obtains through $S_2$ change the channel capacity?

Some communication scenarios in which the channel input and the side information may be correlated, together with the related investigations, can be found in [?] and [16]. In [?], the optimum transmission rate under the requirement of minimum mutual information $I(S^n; Y^n)$ is investigated. Moreover, both [?] and [16] study Costa’s “writing on dirty paper” problem where the side information is correlated with the channel input (our motivation), when only side information known at the transmitter exists. In another work [?], we considered and solved the capacity problem for the Gaussian channel with two-sided state information in a limited case.

Moreover, by examining the Gaussian channel with two-sided state information that depends on the channel noise and the channel input, we try to solve the Gaussian version of the Cover-Chiang theorem [?], which is an open problem.


Second motivation: One of the best-known and most important applications of channels with side information is the information-theoretic description of the “cognition” of the transmitter in communication scenarios. In this description the side information may be, for example, an interference that the transmitter knows exactly. Two questions arise about this description:

1) Knowledge of, or cognition on, something is usually expected to be “quantitative”. For example, the cognition that the transmitter can acquire about the interference may be incomplete or partial. So one question is: how can we describe the “quantity” or “amount” of the transmitter’s cognition? Investigations of channels with partial CSI try to answer this question; see, for example, [18]-[?].

2) In a communication scenario the transmitter may have knowledge about more than one variable in the channel. For example, in a cognitive interference channel the transmitter may have knowledge about the interference originated by the other transmitter and, at the same time, about the channel noise. Hence the other question is: how can we describe the “cognition that the transmitter has on some variables”?

In this paper, we propose describing the “cognition of the transmitter and/or receiver on some variables” by side information that is available at the transmitter and/or receiver and is probabilistically dependent on those variables. Hence, side information known at the transmitter and correlated with a variable $A$ describes the transmitter’s cognition on $A$, and the amount of this cognition increases as the correlation between the side information and $A$ increases. To distinguish this meaning of “cognition” from the usual meaning widely used in the literature, it may be appropriate to call it “re-cognition (of the transmitter or receiver on something)”.

Hence, in the Gaussian channel with two-sided state information depicted in Fig. 2, the side information $S_1$ known at the transmitter can be interpreted as the transmitter’s re-cognition on the channel noise if $S_1$ is correlated with $Z$. Our first motivation can thus be seen not only as an effort to solve an important open problem but also, if the problem is solved, as a way to exemplify this new description.


Our Work 

To address the above motivations, we define a Gaussian channel with two-sided state information in which the channel input $X$, the side information $(S_1, S_2)$ and the channel noise $Z$ are arbitrarily correlated. Using the extension of the Cover-Chiang unifying theorem [?] to continuous alphabets, we prove a general achievable rate for the channel (Lemma 1). We then obtain a general upper bound for the case in which $X$, $(S_1, S_2)$ and $Z$ form the Markov chain $X \to (S_1, S_2) \to Z$ (Lemma 2), show that the lower and upper bounds coincide in this case, and thereby establish our capacity theorem. In terms of our probabilistic description of the transmitter’s “re-cognition”, this condition can be explained as follows: if all the “re-cognition” that the transmitter has on the channel noise is gained from the side information $(S_1, S_2)$, which is a meaningful and practically acceptable condition in our communication scenario, then the Markov chain $X \to (S_1, S_2) \to Z$ must hold. The resulting capacity can be expressed as an increasing function of the mutual information $I(S_1 S_2; Z)$ between the side information $(S_1, S_2)$ and the channel noise $Z$, which shows that our channel and its capacity exemplify the new description of the “re-cognition” of the transmitter and the receiver.

Paper Organization 

This paper is organized as follows. In Section II, we briefly review the Cover-Chiang and Gel’fand-Pinsker theorems and then scrutinize the Costa theorem. In Section III, we define our Gaussian channel in detail, prove a general lower bound for it, and then obtain a general upper bound in the case mentioned above; this upper bound coincides with the lower bound and hence is the capacity of the channel. In Section IV, we examine the capacity in special cases and interpret the results. Specifically, we explain how this capacity theorem can exemplify the new description of the “re-cognition” of the transmitter and/or receiver on something. Section VI contains the conclusion. The proofs of the lower and upper bounds on the capacity and of two lemmas used in our proofs are given in the Appendix.

II. A Review of Previous Related Works 

To clarify our approach in subsequent sections, we first briefly review the Cover-Chiang capacity theorem for channels with side information available at the transmitter and at the receiver. We then review the Gel’fand-Pinsker (GP) theorem, which is the special case of the Cover-Chiang theorem in which side information is known only at the transmitter. Finally, the Costa theorem (the “writing on dirty paper” theorem), which is the Gaussian version of the GP theorem, is investigated in depth.

Fig. 3. Channel with side information available non-causally at the transmitter and at the receiver.

A. Cover-Chiang Theorem 

Fig. 3 shows a channel with side information known at the transmitter and at the receiver, where $X^n$ and $Y^n$ are the transmitted and the received sequences, respectively. The sequences $S_1^n$ and $S_2^n$ are the side information known non-causally at the transmitter and at the receiver, respectively. The transition probability of the channel, $p(y \mid x, s_1, s_2)$, depends on the input $X$ and the side information $S_1$ and $S_2$. It can be shown that if the channel is memoryless and the sequence $(S_1^n, S_2^n)$ is independent and identically distributed (i.i.d.) under $p(s_1, s_2)$, then the capacity of the channel is [?]:

$$C = \max_{p(u,x \mid s_1)} \left[ I(U; S_2, Y) - I(U; S_1) \right] \qquad (1)$$

where the maximum is over all distributions

$$p(y, x, u, s_1, s_2) = p(y \mid x, s_1, s_2)\, p(u, x \mid s_1)\, p(s_1, s_2) \qquad (2)$$

and $U$ is an auxiliary random variable.

It is important to note that the Markov chains

$$S_2 \to S_1 \to UX \qquad (3)$$

$$U \to X S_1 S_2 \to Y \qquad (4)$$

are satisfied for all distributions in (2).

B. Gel’fand-Pinsker (GP) Theorem 

This theorem is the special case of the Cover-Chiang theorem with $S_2 = \emptyset$. According to the GP theorem [?], a memoryless channel with transition probability $p(y \mid x, s_1)$ and side information sequence $S_1^n$, i.i.d. with $p(s_1)$ and known non-causally at the transmitter (depicted in Fig. 4), has the capacity

$$C = \max_{p(u,x \mid s_1)} \left[ I(U; Y) - I(U; S_1) \right] \qquad (5)$$

for all distributions

$$p(y, x, u, s_1) = p(y \mid x, s_1)\, p(u, x \mid s_1)\, p(s_1) \qquad (6)$$

where $U$ is an auxiliary random variable.

Fig. 4. Channel with side information known at the transmitter.

C. Costa’s “Writing on Dirty Paper” 

Costa [?] examined the Gaussian version of the channel with side information known at the transmitter (Fig. 1). As can be seen, the side information is considered as an additive interference at the receiver. Costa showed that the channel, surprisingly, has the capacity $\frac{1}{2}\log\left(1 + \frac{P}{N}\right)$, which is the same as that of a channel with no interference $S_1$. Costa derived this capacity by using the Gel’fand-Pinsker theorem extended to random variables with continuous alphabets. In this subsection, we first introduce the Costa assumptions and then present a proof of the theorem in a form that lets us introduce our channel and develop our theorem in subsequent sections.

The channel is specified by properties C.1-C.3 below:

C.1: $S_1^n$ is a sequence of Gaussian i.i.d. random variables with distribution $S_1 \sim \mathcal{N}(0, Q_1)$.

C.2: The transmitted sequence $X^n$ is assumed to have the power constraint $E\{X^2\} \le P$.

C.3: The output is given by $Y^n = X^n + S_1^n + Z^n$, where $Z^n$ is a sequence of white Gaussian noise with zero mean and power $N$, i.e. $Z \sim \mathcal{N}(0, N)$, independent of $(X, S_1)$. The sequence $S_1^n$ is non-causally known at the transmitter.

It is readily seen that the distributions $p(y, x, u, s_1)$ having the above three properties are of the form (6). We denote the set of all these distributions by $\mathcal{P}_C$. Although no restriction has been imposed on the correlation between $X$ and $S_1$ for the Costa channel described above, in the Costa theorem the maximum rate corresponds to independent $X$ and $S_1$, with $U$ a linear combination of $X$ and $S_1$. We define $\mathcal{P}'_C$ as the subset of $\mathcal{P}_C$ whose elements $p'(y, x, u, s_1)$ have the following properties in addition to properties C.1-C.3:

C.4: $X$ is a zero-mean Gaussian random variable with the maximum average power $P$ and is independent of $S_1$.

C.5: The auxiliary random variable $U$ takes the linear form $U = \alpha S_1 + X$.

It is clear that the set $\mathcal{P}'_C$ (described by C.1-C.5) and its marginal and conditional distributions are subsets of the corresponding $\mathcal{P}_C$ (described by C.1-C.3).


Achievable rate for the Costa channel: From [?], extended to discrete-time memoryless channels with continuous alphabets, we can obtain an achievable rate for the channel.

The capacity of the Costa channel can be written as:

$$C_{Costa} = \max_{p(u,x \mid s_1)} \left[ I(U; Y) - I(U; S_1) \right] \qquad (7)$$

where the maximum is over all $p(y, x, u, s_1)$ in $\mathcal{P}_C$. Since $\mathcal{P}'_C \subset \mathcal{P}_C$ we have:

$$C_{Costa} \ge \max_{p'(u,x \mid s_1)} \left[ I(U; Y) - I(U; S_1) \right] \qquad (8)$$

$$= \max_{p'(u \mid x, s_1)\, p'(x \mid s_1)} \left[ I(U; Y) - I(U; S_1) \right] \qquad (9)$$

$$= \max_{\alpha} \left[ I(U; Y) - I(U; S_1) \right] \qquad (10)$$

The expression in the last bracket is calculated for distributions $p'(y, x, u, s_1)$ in $\mathcal{P}'_C$ described by C.1-C.5. Thus, defining $R(\alpha) = I(U; Y) - I(U; S_1)$, $\max_{\alpha} R(\alpha)$ is an achievable rate for the channel. $R(\alpha)$ and $\max_{\alpha} R(\alpha)$ are calculated as:

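$$R(\alpha) = \frac{1}{2}\log\left(\frac{P\,(P + Q_1 + N)}{P Q_1 (1-\alpha)^2 + N\,(P + \alpha^2 Q_1)}\right) \qquad (11)$$

and

$$\max_{\alpha} R(\alpha) = R(\alpha^*) = \frac{1}{2}\log\left(1 + \frac{P}{N}\right) \qquad (12)$$

where

$$\alpha^* = \frac{P}{P + N}. \qquad (13)$$

Both $R(\alpha^*)$ and $\alpha^*$ are independent of $Q_1$, and hence of $S_1$.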
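As a quick numerical sanity check of the closed form above, the following minimal sketch evaluates $R(\alpha)$ on a grid and compares its maximum with $\frac{1}{2}\log(1 + P/N)$. The function name `costa_rate`, the parameter values and the use of natural logarithms (rates in nats) are assumptions made for illustration; they are not taken from the paper.

```python
import math

def costa_rate(alpha, P, Q1, N):
    """R(alpha) = I(U;Y) - I(U;S1) for U = alpha*S1 + X, in nats (assumed log base)."""
    numerator = P * (P + Q1 + N)
    denominator = P * Q1 * (1 - alpha) ** 2 + N * (P + alpha ** 2 * Q1)
    return 0.5 * math.log(numerator / denominator)

# Illustrative (assumed) powers: input P, interference Q1, noise N.
P, Q1, N = 2.0, 5.0, 1.0

alpha_star = P / (P + N)                       # optimum alpha from (13)
rate_at_alpha_star = costa_rate(alpha_star, P, Q1, N)
interference_free = 0.5 * math.log(1 + P / N)  # right-hand side of (12)

# Coarse grid search over alpha in [0, 1] as an independent check of the maximum.
grid_max = max(costa_rate(k / 1000, P, Q1, N) for k in range(1001))

print(rate_at_alpha_star, interference_free, grid_max)
```

Changing `Q1` in this sketch leaves `rate_at_alpha_star` unchanged, which mirrors the statement that $R(\alpha^*)$ does not depend on $Q_1$.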

Converse part of the Costa theorem: From [?] we can also obtain an upper bound for the channel capacity. We have:

$$I(U; Y) - I(U; S_1) = -H(U \mid Y) + H(U \mid S_1) \qquad (14)$$

$$\le -H(U \mid Y, S_1) + H(U \mid S_1) \qquad (15)$$

$$= I(U; Y \mid S_1) \qquad (16)$$

$$\le I(X; Y \mid S_1) \qquad (17)$$

where inequality (15) follows from the fact that conditioning reduces entropy, and (17) follows from the Markov chain $U \to X S_1 \to Y$, which holds for all distributions $p(y, x, u, s_1)$ of the form (6), including the distributions in the set $\mathcal{P}_C$. Hence we can write:


$$C_{Costa} = \max_{p(u,x \mid s_1)} \left[ I(U; Y) - I(U; S_1) \right] \qquad (18)$$

$$\le \max_{p(x \mid s_1)} \left[ I(X; Y \mid S_1) \right] \qquad (19)$$

$$= \max_{p(x \mid s_1)} \left[ H(Y \mid S_1) - H(Y \mid X, S_1) \right] \qquad (20)$$

$$= \max_{p(x \mid s_1)} \left[ H(X + Z \mid S_1) - H(Z \mid X, S_1) \right] \qquad (21)$$

$$\le \max_{p(x \mid s_1)} \left[ H(X + Z) - H(Z) \right] \qquad (22)$$

$$= \frac{1}{2}\log\left(1 + \frac{P}{N}\right) \qquad (23)$$

where inequality (22) is due to the fact that conditioning reduces entropy, together with $H(Z \mid X, S_1) = H(Z)$ because $Z$ is independent of $(X, S_1)$. The maximum in (22) is obtained when $X$ and $Z$ are jointly Gaussian with $E\{X^2\} = P$, because the Gaussian distribution maximizes entropy when the variance is limited. From (12) and (23) it is seen that the lower and upper bounds of the capacity coincide, and therefore the channel capacity is equal to $\frac{1}{2}\log\left(1 + \frac{P}{N}\right)$. It also follows that, for the channel described by C.1-C.3, the optimum condition leading to the capacity is $X \sim \mathcal{N}(0, P)$, independent of $S_1$. □

The Costa theorem can be illustrated further as follows. Consider $Y = X + S_1 + S_1' + Z$ with independent Gaussian interferences $S_1$ with power $Q_1$ and $S_1'$ with power $Q_1'$, and noise $Z$ with power $N$. If the transmitter knows nothing about these interferences, then we take $U = X$ and $C = \frac{1}{2}\log\left(1 + \frac{P}{N + Q_1 + Q_1'}\right)$. If $S_1$ is known at the transmitter, then we take $U = X + \alpha S_1$ and we have $C = \frac{1}{2}\log\left(1 + \frac{P}{N + Q_1'}\right)$. If $S_1$ and $S_1'$ are both known at the transmitter, then $U = X + \alpha S_1 + \beta S_1'$ and $C = \frac{1}{2}\log\left(1 + \frac{P}{N}\right)$.
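The three cases in this illustration can be compared numerically. The sketch below is only an illustration: the helper `awgn_capacity`, the power values and the natural-log base are assumptions, and the two interference powers are named `Q_a` and `Q_b` purely for the example.

```python
import math

def awgn_capacity(signal_power, noise_power):
    """Capacity (1/2) log(1 + signal/noise) of a Gaussian channel, in nats."""
    return 0.5 * math.log(1 + signal_power / noise_power)

# Assumed example powers: input P, noise N, and two independent interferences.
P, N = 2.0, 1.0
Q_a, Q_b = 3.0, 4.0

# Neither interference known at the transmitter: both act as extra noise.
c_none_known = awgn_capacity(P, N + Q_a + Q_b)
# Only the first interference known: dirty-paper coding removes its effect.
c_one_known = awgn_capacity(P, N + Q_b)
# Both interferences known: the interference-free capacity is recovered.
c_both_known = awgn_capacity(P, N)

print(c_none_known, c_one_known, c_both_known)
```

Each additional interference known at the transmitter removes the corresponding power from the effective noise, so the three values are strictly increasing.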


III. Capacity Theorem for the Gaussian Channel with Two-sided Input-Noise Dependent Side Information

In this section we introduce a Gaussian channel with two-sided state information correlated with the channel input and noise. We then present our capacity theorem for this Gaussian channel. The theorem gives the capacity of the channel in the case where the channel input $X$, the side information $(S_1, S_2)$ and the channel noise $Z$ form the Markov chain $X \to (S_1, S_2) \to Z$. Under our new description of the “re-cognition” of the transmitter on the channel noise, the probabilistic dependency between the side information $(S_1, S_2)$ and the channel noise $Z$ determines the cognition on the channel noise that the side information carries to the transmitter. Therefore, this Markov chain states that the transmitter acquires all its knowledge of the channel noise from the side information $(S_1, S_2)$ alone, which is practically meaningful and acceptable in our scenario. To prove the theorem, we obtain a general achievable rate for the channel (Lemma 1) and then a general upper bound on the channel capacity in the mentioned case (Lemma 2), and show that these lower and upper bounds coincide.
Fig. 5. Partitioning $\mathcal{P}_C$ into the subsets $\mathcal{P}_{\rho_{XS_1}}$. $p^*(y, x, u, s_1)$ is the optimum distribution for the Costa channel.


A. Definition of the Channel 

As mentioned before, in a Gaussian channel with side information known at the transmitter, defined by the set $\mathcal{P}_C$ with properties C.1-C.3 (the Costa channel), no restriction is imposed on the correlation between the channel input $X$ and the side information $S_1$. As mentioned in Section I, the capacity $\frac{1}{2}\log\left(1 + \frac{P}{N}\right)$ is valid only for channels in which $X$ and $S_1$ are free to be independent; specifically, the maximum rate is achieved when $X$ and $S_1$ are independent. Let $\mathcal{P}_C$ be partitioned into subsets $\mathcal{P}_{\rho_{XS_1}}$, each containing the distributions $p(y, x, u, s_1)$ for which the correlation coefficient between $X$ and $S_1$ is equal to $\rho_{XS_1}$, as depicted in Fig. 5. It is obvious that $\mathcal{P}'_C$ (the set of distributions with properties C.1-C.5) is a subset of $\mathcal{P}_{\rho_{XS_1} = 0}$, and therefore the optimum distribution leading to the capacity of the Costa channel does not belong to the other partitions. We can therefore claim that the Costa theorem is not valid for channels defined with random variables $(Y, X, U, S_1) \sim p(y, x, u, s_1)$ in a partition $\mathcal{P}_{\rho_{XS_1}}$ with $\rho_{XS_1} \ne 0$.


Consider the Gaussian channel depicted in Fig. 2. The side information $S_1$ at the transmitter and $S_2$ at the receiver is considered as additive interference at the receiver. From the above discussion, and in line with the motivations given in Section I, our channel differs from Costa’s in three ways:

1) In our channel, a specified correlation coefficient $\rho_{XS_1}$ exists between $X$ and $S_1$.

2) To investigate the effect of the side information known at the receiver, we suppose that our channel has a Gaussian side information $S_2$, known non-causally at the receiver, which is correlated with both $X$ and $S_1$.

3) We allow the channel input $X$ and the side information $S_1$ and $S_2$ to be correlated with the channel noise $Z$.

Remark: It is important to note that, as we prove in Lemma 3 in Appendix C, assuming that the input random variable $X$ is correlated with $S_1$ and $S_2$ with specified correlation coefficients does not impose any restriction on $X$’s own distribution, which remains free to choose.
Considering the above differences, our channel is defined by properties GC.1-GC.4 below (GC for General version of Costa):

GC.1: $(S_1^n, S_2^n)$ are i.i.d. sequences with zero mean and jointly Gaussian distribution, with powers $\sigma_{S_1}^2 = Q_1$ and $\sigma_{S_2}^2 = Q_2$, respectively (so we have $S_1 \sim \mathcal{N}(0, Q_1)$ and $S_2 \sim \mathcal{N}(0, Q_2)$).

GC.2: The output sequence is $Y^n = X^n + S_1^n + S_2^n + Z^n$, where $Z^n$ is a sequence of white Gaussian noise with zero mean and power $N$ ($Z \sim \mathcal{N}(0, N)$). The sequences $S_1^n$ and $S_2^n$ are non-causally known at the transmitter and at the receiver, respectively.

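GC.3: The random variables $(X, S_1, S_2, Z)$ have the covariance matrix $K$:

$$K = E\left\{\begin{pmatrix} X^2 & XS_1 & XS_2 & XZ \\ XS_1 & S_1^2 & S_1S_2 & S_1Z \\ XS_2 & S_1S_2 & S_2^2 & S_2Z \\ XZ & S_1Z & S_2Z & Z^2 \end{pmatrix}\right\} \qquad (24)$$

$$= \begin{pmatrix} \sigma_X^2 & \sigma_X\sigma_{S_1}\rho_{XS_1} & \sigma_X\sigma_{S_2}\rho_{XS_2} & \sigma_X\sigma_Z\rho_{XZ} \\ \sigma_X\sigma_{S_1}\rho_{XS_1} & \sigma_{S_1}^2 & \sigma_{S_1}\sigma_{S_2}\rho_{S_1S_2} & \sigma_{S_1}\sigma_Z\rho_{S_1Z} \\ \sigma_X\sigma_{S_2}\rho_{XS_2} & \sigma_{S_1}\sigma_{S_2}\rho_{S_1S_2} & \sigma_{S_2}^2 & \sigma_{S_2}\sigma_Z\rho_{S_2Z} \\ \sigma_X\sigma_Z\rho_{XZ} & \sigma_{S_1}\sigma_Z\rho_{S_1Z} & \sigma_{S_2}\sigma_Z\rho_{S_2Z} & \sigma_Z^2 \end{pmatrix} \qquad (25)$$

Therefore, in our channel, the Gaussian noise $Z$ is not necessarily independent of the additive interference $S_1$ and $S_2$ and the input $X$. Moreover, $X^n$ is assumed to satisfy the constraint $\sigma_X^2 \le P$. Except for $\sigma_X$, all other parameters in $K$ have fixed values specified for the channel and must be considered part of the definition of the channel.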

GC.4: $(X, U, S_1, S_2)$ form the Markov chain $S_2 \to S_1 \to UX$. As mentioned earlier, this Markov chain is satisfied by all distributions $p(y, x, u, s_1, s_2)$ of the form (2) in the Cover-Chiang capacity theorem and is physically reasonable. Since this Markov chain implies the weaker Markov chain $S_2 \to S_1 \to X$, as proved in Lemma 4 in Appendix D, this property implies that in the covariance matrix $K$ in (25) we have:

$$\rho_{XS_2} = \rho_{XS_1}\,\rho_{S_1S_2} \qquad (26)$$

It is readily seen that all distributions $p(y, x, u, s_1, s_2)$ having properties GC.1-GC.4 are of the form (2). Therefore we can apply the extended version of the Cover-Chiang theorem for random variables with continuous alphabets to our channel. We denote the set of all these distributions $p(y, x, u, s_1, s_2)$ by $\mathcal{P}_{\rho_{XS_1}}$.
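To make the channel definition concrete, the following minimal sketch assembles a covariance matrix for $(X, S_1, S_2, Z)$ from powers and correlation coefficients, imposes the relation (26) on $\rho_{XS_2}$, and checks positive definiteness with Sylvester's criterion. The function names and all numerical values are illustrative assumptions; only the structure of the matrix and relation (26) come from the text.

```python
import math

def det(m):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def build_covariance(P, Q1, Q2, N, rho_xs1, rho_s1s2, rho_s1z, rho_s2z, rho_xz):
    """Covariance of (X, S1, S2, Z); rho_XS2 is fixed by (26)."""
    rho_xs2 = rho_xs1 * rho_s1s2
    sigma = [math.sqrt(v) for v in (P, Q1, Q2, N)]
    rho = [[1.0, rho_xs1, rho_xs2, rho_xz],
           [rho_xs1, 1.0, rho_s1s2, rho_s1z],
           [rho_xs2, rho_s1s2, 1.0, rho_s2z],
           [rho_xz, rho_s1z, rho_s2z, 1.0]]
    return [[sigma[i] * sigma[j] * rho[i][j] for j in range(4)] for i in range(4)]

def is_positive_definite(K):
    """Sylvester's criterion: every leading principal minor is positive."""
    return all(det([row[:k] for row in K[:k]]) > 0 for k in range(1, len(K) + 1))

# Assumed example values (not from the paper).
K = build_covariance(P=1.0, Q1=2.0, Q2=0.5, N=1.0,
                     rho_xs1=0.3, rho_s1s2=0.4, rho_s1z=0.2, rho_s2z=0.1, rho_xz=0.06)
print(is_positive_definite(K))
```

Not every choice of correlation coefficients yields a valid covariance matrix, so a check of this kind is a reasonable first step before evaluating any rate expression.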


Remark: In the absence of $S_2$ and when $Z$ is independent of $(X, S_1)$, we can compare the capacity of our channel with that of the Costa channel and write:

$$C_{Costa} = \max_{S_2 = 0,\ \rho_{XS_1}} C_1 \qquad (27)$$

where $C_1$ denotes the capacity of our channel when $Z$ is independent of $(X, S_1, S_2)$. Note that in this case, and when $S_2 = 0$, we have $\mathcal{P}_C = \bigcup_{\rho_{XS_1}} \mathcal{P}_{\rho_{XS_1}}$; therefore, looking for the maximum rate in $\mathcal{P}_C$ amounts to looking for the maximum rate among the subsets $\mathcal{P}_{\rho_{XS_1}}$.

We will show that the optimum distribution, which gives the maximum transmission rate, is obtained when $(X, S_1, S_2)$ are jointly Gaussian and the auxiliary random variable $U$ is a linear combination of $X$ and $S_1$. We denote by $\mathcal{P}^*_{\rho_{XS_1}}$ the set of distributions $p^*(y, x, u, s_1, s_2)$ having properties GC.5 and GC.6 below in addition to properties GC.1-GC.4.

GC.5: The random variables $(X, S_1, S_2)$ are jointly Gaussian, and $X$ has zero mean and the maximum power $P$, i.e. $X \sim \mathcal{N}(0, P)$.

GC.6: As in the Costa theorem:

$$U = \alpha S_1 + X \qquad (28)$$

where $X$ and $S_1$ are now correlated.

It is clear that the set $\mathcal{P}^*_{\rho_{XS_1}}$ (described by GC.1-GC.6) and its marginal and conditional distributions are subsets of the corresponding $\mathcal{P}_{\rho_{XS_1}}$ (described by GC.1-GC.4).


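As the final part of this subsection we introduce some definitions required for our capacity theorem.

Suppose $K$ is the covariance matrix of random variables $(X, S_1, S_2, Z)$ having all properties GC.1-GC.6. Defining:

$$A_i = E\{X S_i\} = \sigma_X \sigma_{S_i} \rho_{XS_i}, \quad i = 1, 2 \qquad (29)$$

$$L_0 = E\{X Z\} = \sigma_X \sigma_Z \rho_{XZ} \qquad (30)$$

$$L_i = E\{S_i Z\} = \sigma_{S_i} \sigma_Z \rho_{S_iZ}, \quad i = 1, 2 \qquad (31)$$

$$B = E\{S_1 S_2\} = \sigma_{S_1} \sigma_{S_2} \rho_{S_1S_2} \qquad (32)$$

we can write $K$, its determinant $D$ and its minors as:

$$K = \begin{pmatrix} P & A_1 & A_2 & L_0 \\ A_1 & Q_1 & B & L_1 \\ A_2 & B & Q_2 & L_2 \\ L_0 & L_1 & L_2 & N \end{pmatrix} \qquad (33)$$

$$D = \begin{vmatrix} P & A_1 & A_2 & L_0 \\ A_1 & Q_1 & B & L_1 \\ A_2 & B & Q_2 & L_2 \\ L_0 & L_1 & L_2 & N \end{vmatrix} \qquad (34)$$

$$\begin{aligned}
d_P &= \begin{vmatrix} Q_1 & B & L_1 \\ B & Q_2 & L_2 \\ L_1 & L_2 & N \end{vmatrix}, &
d_{Q_1} &= \begin{vmatrix} P & A_2 & L_0 \\ A_2 & Q_2 & L_2 \\ L_0 & L_2 & N \end{vmatrix}, &
d_{A_1} &= \begin{vmatrix} A_1 & B & L_1 \\ A_2 & Q_2 & L_2 \\ L_0 & L_2 & N \end{vmatrix}, \\
d_{L_0} &= \begin{vmatrix} A_1 & Q_1 & B \\ A_2 & B & Q_2 \\ L_0 & L_1 & L_2 \end{vmatrix}, &
d_{L_1} &= \begin{vmatrix} P & A_1 & A_2 \\ A_2 & B & Q_2 \\ L_0 & L_1 & L_2 \end{vmatrix}, &
d_N &= \begin{vmatrix} P & A_1 & A_2 \\ A_1 & Q_1 & B \\ A_2 & B & Q_2 \end{vmatrix}, \\
d_{Q_1N} &= \begin{vmatrix} P & A_2 \\ A_2 & Q_2 \end{vmatrix}, &
d_{Q_2N} &= \begin{vmatrix} P & A_1 \\ A_1 & Q_1 \end{vmatrix}, &
d_{PN} &= \begin{vmatrix} Q_1 & B \\ B & Q_2 \end{vmatrix}, \\
d_{PQ_1} &= \begin{vmatrix} Q_2 & L_2 \\ L_2 & N \end{vmatrix}, &
d_{L_0L_1} &= \begin{vmatrix} A_1 & A_2 \\ B & Q_2 \end{vmatrix}, &
d_{PL_1} &= \begin{vmatrix} B & Q_2 \\ L_1 & L_2 \end{vmatrix}, \\
d_{Q_1L_0} &= \begin{vmatrix} A_2 & Q_2 \\ L_0 & L_2 \end{vmatrix}
\end{aligned} \qquad (35)$$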

B. The Capacity of the Channel 

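Theorem: The Gaussian channel defined by properties GC.1-GC.4, when the channel input $X$, the side information $(S_1, S_2)$ and the channel noise $Z$ form the Markov chain $X \to (S_1, S_2) \to Z$, has the capacity:

$$C = \frac{1}{2}\log\left(1 + \frac{P}{N}\,\frac{\left(1 - \rho_{XS_1}^2\right)\left(1 - \rho_{S_1S_2}^2\right)}{d'_P}\right) \qquad (36)$$

where

$$d'_P = \begin{vmatrix} 1 & \rho_{S_1S_2} & \rho_{S_1Z} \\ \rho_{S_1S_2} & 1 & \rho_{S_2Z} \\ \rho_{S_1Z} & \rho_{S_2Z} & 1 \end{vmatrix} = 1 + 2\rho_{S_1S_2}\rho_{S_1Z}\rho_{S_2Z} - \rho_{S_1S_2}^2 - \rho_{S_1Z}^2 - \rho_{S_2Z}^2. \qquad (37)$$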
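The capacity expression can be evaluated directly from the correlation coefficients. The sketch below is a minimal illustration under an assumed reading of (36) and (37), namely $C = \frac{1}{2}\log\left(1 + \frac{P}{N}(1 - \rho_{XS_1}^2)(1 - \rho_{S_1S_2}^2)/d'_P\right)$ with $d'_P$ the determinant of the correlation matrix of $(S_1, S_2, Z)$; the function names, the example values and the natural-log base are assumptions.

```python
import math

def d_prime_P(rho_s1s2, rho_s1z, rho_s2z):
    """Determinant of the correlation matrix of (S1, S2, Z), as in (37)."""
    return (1 + 2 * rho_s1s2 * rho_s1z * rho_s2z
            - rho_s1s2 ** 2 - rho_s1z ** 2 - rho_s2z ** 2)

def capacity(P, N, rho_xs1, rho_s1s2, rho_s1z, rho_s2z):
    """Capacity under the assumed reading of (36), in nats."""
    dp = d_prime_P(rho_s1s2, rho_s1z, rho_s2z)
    if dp <= 0:
        raise ValueError("correlation coefficients do not form a valid (S1, S2, Z) matrix")
    gain = (1 - rho_xs1 ** 2) * (1 - rho_s1s2 ** 2) / dp
    return 0.5 * math.log(1 + (P / N) * gain)

# Assumed example values.
P, N = 2.0, 1.0
c_costa_like = capacity(P, N, rho_xs1=0.0, rho_s1s2=0.4, rho_s1z=0.0, rho_s2z=0.0)
c_noise_aware = capacity(P, N, rho_xs1=0.3, rho_s1s2=0.4, rho_s1z=0.2, rho_s2z=0.1)
c_noise_blind = capacity(P, N, rho_xs1=0.3, rho_s1s2=0.4, rho_s1z=0.0, rho_s2z=0.0)
print(c_costa_like, c_noise_aware, c_noise_blind)
```

With no input correlation and no noise correlation the value coincides with $\frac{1}{2}\log(1 + P/N)$, while side information correlated with the noise raises the rate relative to the noise-independent case.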


Proof of Theorem: To prove the theorem, we first prove a general achievable rate for the channel in Lemma 1. Then, in Lemma 2, we obtain an upper bound for the channel in the case where the transmitter acquires all its knowledge of the channel noise $Z$ from the side information $(S_1, S_2)$, i.e. when we have the Markov chain $X \to (S_1, S_2) \to Z$. We then show that this upper bound coincides with the lower bound on the capacity.

We note that the Markov chain $X \to (S_1, S_2) \to Z$ and the Markov chain $X \to S_1 \to S_2$ from GC.4 imply the weaker Markov chain $X \to S_1 \to Z$. Since $S_1$ and $Z$ are Gaussian, as we prove in Lemma 4 in Appendix D, the latter Markov chain implies that

$$\rho_{XZ} = \rho_{XS_1}\,\rho_{S_1Z}. \qquad (38)$$


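Lemma 1. A General Lower Bound for the Capacity of the Channel: The capacity of the Gaussian channel defined by properties GC.1-GC.4 has the lower bound

$$\frac{1}{2}\log\left(1 + \frac{\left[\sigma_X\left(1 - \rho_{XS_1}^2\right) - \sigma_Z\left(\rho_{XS_1}\rho_{S_1Z} - \rho_{XZ}\right)\right]^2 \left(1 - \rho_{S_1S_2}^2\right)}{\sigma_Z^2\left(d'_P\left(1 - \rho_{XS_1}^2\right) - \left(\rho_{XS_1}\rho_{S_1Z} - \rho_{XZ}\right)^2 \left(1 - \rho_{S_1S_2}^2\right)\right)}\right) \qquad (39)$$

where $d'_P$ is defined in (37).

Proof: Appendix A contains the proof.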
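The step that closes the proof, substituting (38) into the lower bound, can be checked numerically. The sketch below assumes the following reading of (39): the argument of the logarithm is $1 + \frac{[\sigma_X(1-\rho_{XS_1}^2) - \sigma_Z\delta]^2 (1-\rho_{S_1S_2}^2)}{\sigma_Z^2 (d'_P (1-\rho_{XS_1}^2) - \delta^2 (1-\rho_{S_1S_2}^2))}$ with $\delta = \rho_{XS_1}\rho_{S_1Z} - \rho_{XZ}$, evaluated at $\sigma_X^2 = P$. The function names and example values are assumptions made for illustration.

```python
import math

def d_prime_P(r12, r1z, r2z):
    """Determinant of the correlation matrix of (S1, S2, Z), as in (37)."""
    return 1 + 2 * r12 * r1z * r2z - r12 ** 2 - r1z ** 2 - r2z ** 2

def lower_bound(P, N, rho_xs1, rho_xz, r12, r1z, r2z):
    """Assumed reading of the lower bound (39), in nats, with sigma_X^2 = P."""
    sigma_x, sigma_z = math.sqrt(P), math.sqrt(N)
    delta = rho_xs1 * r1z - rho_xz          # vanishes under relation (38)
    dp = d_prime_P(r12, r1z, r2z)
    num = (sigma_x * (1 - rho_xs1 ** 2) - sigma_z * delta) ** 2 * (1 - r12 ** 2)
    den = N * (dp * (1 - rho_xs1 ** 2) - delta ** 2 * (1 - r12 ** 2))
    return 0.5 * math.log(1 + num / den)

def capacity(P, N, rho_xs1, r12, r1z, r2z):
    """Assumed reading of the capacity (36), in nats."""
    dp = d_prime_P(r12, r1z, r2z)
    return 0.5 * math.log(1 + (P / N) * (1 - rho_xs1 ** 2) * (1 - r12 ** 2) / dp)

# Assumed example values; rho_xz is set by (38) so the Markov condition holds.
P, N = 2.0, 1.0
rho_xs1, r12, r1z, r2z = 0.3, 0.4, 0.2, 0.1
rho_xz = rho_xs1 * r1z

bound = lower_bound(P, N, rho_xs1, rho_xz, r12, r1z, r2z)
cap = capacity(P, N, rho_xs1, r12, r1z, r2z)
print(bound, cap)
```

When `rho_xz` satisfies (38), `delta` is zero and the two printed values agree, which is the coincidence of bounds used in the proof.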


Lemma 2. Upper Bound for the Capacity of the Channel: The capacity of the Gaussian channel defined by properties GC.1-GC.4, when the channel input $X$, the side information $(S_1, S_2)$ and the channel noise $Z$ form the Markov chain $X \to (S_1, S_2) \to Z$, has the upper bound $C$ in (36).

Proof: Appendix B contains the proof.

To complete the proof of the theorem, it is enough to compute the lower bound (39) when we have the Markov chain $X \to (S_1, S_2) \to Z$. Applying equation (38) to equation (39) shows that the upper and lower bounds on the capacity of the channel coincide in this case, and the proof is complete. □
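Remark 1: It can be shown that for variables $S_1$, $S_2$ and $Z$ with properties GC.1 and GC.4:

$$I(S_1 S_2; Z) = \frac{1}{2}\log\left(\frac{1 - \rho_{S_1S_2}^2}{d'_P}\right) \qquad (40)$$

and so the channel capacity (36) can be written as:

$$C = \frac{1}{2}\log\left(1 + \frac{P}{N}\left(1 - \rho_{XS_1}^2\right)\exp\left(2 I(S_1 S_2; Z)\right)\right) \qquad (41)$$

that is, an increasing function of $I(S_1 S_2; Z)$.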
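The mutual information in this remark can be cross-checked against the standard determinant formula for jointly Gaussian variables, $I(S_1 S_2; Z) = \frac{1}{2}\log\frac{\det K_{S}\,\sigma_Z^2}{\det K_{SZ}}$, where $K_S$ and $K_{SZ}$ are the covariance matrices of $(S_1, S_2)$ and $(S_1, S_2, Z)$. The sketch below assumes the closed form $\frac{1}{2}\log\left((1 - \rho_{S_1S_2}^2)/d'_P\right)$ for (40); the names and values are illustrative assumptions.

```python
import math

def det(m):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def mi_from_covariances(Q1, Q2, N, r12, r1z, r2z):
    """I(S1,S2;Z) in nats from covariance determinants (jointly Gaussian case)."""
    s1, s2, sz = math.sqrt(Q1), math.sqrt(Q2), math.sqrt(N)
    K_s = [[Q1, s1 * s2 * r12],
           [s1 * s2 * r12, Q2]]
    K_sz = [[Q1, s1 * s2 * r12, s1 * sz * r1z],
            [s1 * s2 * r12, Q2, s2 * sz * r2z],
            [s1 * sz * r1z, s2 * sz * r2z, N]]
    return 0.5 * math.log(det(K_s) * N / det(K_sz))

def mi_closed_form(r12, r1z, r2z):
    """Assumed reading of (40): (1/2) log((1 - rho_S1S2^2) / d'_P)."""
    dp = 1 + 2 * r12 * r1z * r2z - r12 ** 2 - r1z ** 2 - r2z ** 2
    return 0.5 * math.log((1 - r12 ** 2) / dp)

# Assumed example values.
numeric = mi_from_covariances(Q1=2.0, Q2=0.5, N=1.5, r12=0.4, r1z=0.2, r2z=0.1)
closed = mi_closed_form(0.4, 0.2, 0.1)
print(numeric, closed)
```

The powers cancel in the determinant ratio, so the mutual information depends only on the correlation coefficients, as the closed form suggests.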

Remark 2: The transmission rate $C$ in (36) can be achieved by the encoding and decoding scheme presented in [?], modified for continuous Gaussian distributions.


IV. Interpretations and Numerical Results of the Capacity Theorem 

In the previous section, we obtained the capacity of the Gaussian channel with two-sided state information correlated with the channel input and noise. The capacity theorem is general except that the Markov chain $X \to (S_1, S_2) \to Z$ must be satisfied. In this section we present some corollaries of the capacity theorem. First, we examine the effect of the correlation between the side information and the channel input on the channel capacity. Second, we use our capacity theorem to exemplify our new description of the “cognition” of a communicating object (here, the transmitter and/or receiver) on some feature of the channel (here, the channel noise).
A. The Effect of the Correlation between the Side Information and the Channel Input on the Capacity:

If we assume that the channel noise $Z$ is independent of $(X, S_1, S_2)$, then from (36) the capacity of the channel is:

$$C_1 = \frac{1}{2}\log\left(1 + \frac{P}{N}\left(1 - \rho_{XS_1}^2\right)\right) \qquad (42)$$

Corollary 1: From (27), $C_1$ reduces to the Costa capacity when it is maximized over $\rho_{XS_1}$, i.e. at $\rho_{XS_1} = 0$.

Corollary 2: When the side information $S_2$ is independent of the channel noise $Z$, the capacity of the channel is equal to the capacity when there is no interference $S_2$. In other words, in this case the receiver can subtract the known $S_2^n$ from the received $Y^n$ without losing any useful information.

Corollary 3: The correlation between $X$ and $S_1$ decreases the capacity of the channel. This can be explained as follows: looking at $Y = X + S_1 + Z$ in our dirty-paper-like coding, mitigating the effect of the input-dependent interference also mitigates the impact of the input power on the channel capacity, which appears in (42) as the factor $(1 - \rho_{XS_1}^2)$. As an extreme and interesting case, when $S_1 = X$ (so that $\rho_{XS_1} = 1$), the usual Gaussian coding suggests that the capacity is $\frac{1}{2}\log\left(1 + \frac{4P}{N}\right)$, which is the capacity when $2X$ is transmitted and $Y = 2X + Z$ is received. But as our theorem shows, the capacity is, paradoxically, zero, because the receiver, based on its information, has to decode according to the dirty-paper-like coding. In DP-like coding, given the known sequence $S_{1,0}^n$, we must find an auxiliary sequence $U_0^n$ of the form $U_0^n = X_0^n + \alpha^* S_{1,0}^n$ that is jointly typical with $S_{1,0}^n$. Joint typicality of $(U_0^n, S_{1,0}^n)$ is equivalent to:

$$\left(U_0^n - \alpha^* S_{1,0}^n\right)^{T} S_{1,0}^n \le \delta, \qquad \delta \text{ small}, \tag{43}$$

where $(\cdot)^{T}$ denotes the transpose operation and $\alpha^*$ is computed according to (68). If $X = S_1$, there exists no such $U_0^n$: since $X_0^n = U_0^n - \alpha^* S_{1,0}^n = S_{1,0}^n$, we have

$$\left(U_0^n - \alpha^* S_{1,0}^n\right)^{T} S_{1,0}^n = \left\|S_{1,0}^n\right\|^2, \tag{44}$$

where $\|S_{1,0}^n\|$ is the norm of the given known sequence $S_{1,0}^n$, and therefore (43) cannot be true. In other words, in this case an encoding error occurs.

Fig. 6 shows the variation of the capacity $C_1$ with respect to $\rho_{XS_1}$ when $\frac{P}{N} = 1$. It is seen that when the correlation between the channel input and the side information known at the transmitter increases, the channel capacity decreases. The maximum capacity is attained at $\rho_{XS_1} = 0$, which is Costa's capacity. Fig. 7 shows the capacity $C_1$ with respect to SNR for five values of $\rho_{XS_1}$.
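
The trend described above can be checked numerically. The following is a minimal sketch, assuming natural logarithms (capacity in nats) and treating the ratio P/N as a single `snr` argument; the function name `capacity_c1` is illustrative and not part of the paper.

```python
import math


def capacity_c1(snr: float, rho_xs1: float) -> float:
    """Evaluate (42): 0.5 * log(1 + snr * (1 - rho^2)), in nats.

    Assumes the channel noise is independent of (X, S1, S2).
    """
    if not -1.0 <= rho_xs1 <= 1.0:
        raise ValueError("correlation coefficient must lie in [-1, 1]")
    return 0.5 * math.log(1.0 + snr * (1.0 - rho_xs1 ** 2))


if __name__ == "__main__":
    # Sweep the input/side-information correlation at P/N = 1.
    for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"rho_XS1 = {rho:.2f}  ->  C1 = {capacity_c1(1.0, rho):.4f} nats")
```

At `rho_xs1 = 0` the function returns the Costa value 0.5 log(1 + P/N), and at `rho_xs1 = 1` it returns zero, matching the extreme case discussed in Corollary 3.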


Fig. 6. Capacity of the channel with respect to $\rho_{XS_1}$ when $S_2 = 0$ and $\frac{P}{N} = 1$.

Fig. 7. Capacity of the channel with respect to the SNR when $S_2 = 0$.


B. Exemplification of the Re-cognition of Transmitter and Receiver on the Channel Noise: 

1) Re-cognition: “Cognition” is an indispensable concept in communication. It is reasonable to assume that an intelligent 
communicating object (transmitter, receiver, relay and so on) has some side knowledge about certain features of 
the communication channel. This extra information, held for example by the transmitter, is described by 
“side information” known at the transmitter. In the usual description, the side information 
is taken to be the subject of cognition itself, for example the interference of another transmitter in a cognitive 
radio channel [23]. On the other hand, the assumption that the knowledge may be incomplete or imperfect is 
necessary in most communication scenarios. Descriptions of this incomplete cognition and of the corresponding 
information-theoretic concept, i.e., partial side information, are found in the literature: for example, the imperfectly 
known interference has been partitioned into one perfectly known part and one unknown part, and partial side 
information has been modeled as a version of the subject variable disturbed by noise. 

We try here to present an alternative description of the concept of “cognition” in communication by means of the concept 
of side information. The essential property of this description is the separation of the subject of knowledge K (for 
example interference, channel noise, fading coefficients and so on) from the side information S that carries the 
knowledge for the intelligent agent (for example transmitter, receiver, relay and so on) and is known by it. This point 
of view is compatible with what happens in reality: we always acquire our knowledge of something indirectly, by 
knowing other things. What makes it possible to extract knowledge about K from S is the dependency between S 
and K. Every method of extracting knowledge about K from S (estimation and so on) ultimately relies on this 
dependency. If S is independent of K, then S is non-informative about K, and it is expected that increasing the 
dependency between S and K increases the knowledge that S can provide about K. 

To avoid confusion between this new description and the usual descriptions of cognition, we use the word “re-cognition” 
for it and define it as follows: 

A communicating agent (transmitter, receiver, relay and so on) has “re-cognition” on some variable K if the 
side information S known by it has probabilistic dependency on K. 
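
To make the link between dependency and knowledge concrete, the sketch below takes the special case in which S and K are jointly Gaussian with correlation coefficient `rho`. This Gaussian assumption is ours and is not required by the definition above. In that case the uncertainty left about K after observing S, and the mutual information between them, have simple closed forms. The function names are illustrative.

```python
import math


def residual_variance(var_k: float, rho: float) -> float:
    """Variance of K that remains after observing S (jointly Gaussian case)."""
    return var_k * (1.0 - rho ** 2)


def gaussian_mutual_information(rho: float) -> float:
    """I(S; K) in nats for jointly Gaussian S and K with correlation rho."""
    if abs(rho) >= 1.0:
        return math.inf  # perfect dependency: S determines K
    return -0.5 * math.log(1.0 - rho ** 2)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9, 0.99):
        print(rho, residual_variance(1.0, rho), gaussian_mutual_information(rho))
```

With `rho = 0` nothing is learned (the residual variance equals the prior variance and the mutual information is zero), and both measures improve monotonically as the dependency grows.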

2) Exemplification: In the Gaussian channel defined and analyzed in the previous section, the side information 
$(S_1, S_2)$ is dependent on the channel noise, and therefore the transmitter and the receiver have re-cognition on 
the channel noise through $S_1$ and $S_2$, respectively. The capacity is proved under the Markovity constraint $X \to (S_1, S_2) \to Z$. 
In the new description of re-cognition, this Markov chain simply means that the transmitter acquires all 
its re-cognition on the channel noise via the side information $(S_1, S_2)$, which is meaningful and acceptable. 

Corollary 4: If $S_2 = 0$, the transmitter has re-cognition on the channel noise $Z$, obtained through $S_1$, which is correlated with 
the noise. If there is no constraint on the correlation between $X$ and $S_1$, then $\rho_{XS_1} = 0$ maximizes the transmission rate, 
as mentioned in (27). Therefore, from (36) and (41), the capacity in this case is: 

$$C = \frac{1}{2}\log\left(1 + \frac{P}{N}\cdot\frac{1}{1 - \rho_{S_1Z}^2}\right) = \frac{1}{2}\log\left(1 + \frac{P}{N}\exp\left(2I(S_1;Z)\right)\right). \tag{45}$$


It is seen that more correlation between $S_1$ and $Z$ results in more re-cognition of the transmitter on the channel 
noise and in more capacity. The capacity becomes infinite when $\rho_{S_1Z} = \pm 1$, in which case the transmitter has perfect 
re-cognition of the channel noise. 

Fig. 8 illustrates the capacity of the channel with respect to $\rho_{S_1Z}$, the correlation coefficient between the side 
information and the channel noise, when $\frac{P}{N} = 1$. It is seen that when the correlation increases (meaning 
that $S_1$ carries more re-cognition on the channel noise to the transmitter), the capacity increases. Fig. 9 shows the 
capacity of the channel with respect to SNR for five values of $\rho_{S_1Z}$. Fig. 10 illustrates the capacity of the channel 
with respect to the mutual information $I(S_1; Z)$ for five values of SNR. 
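
The two forms of the capacity in (45) can be compared numerically. This is a minimal sketch, assuming natural logarithms and a jointly Gaussian pair (S1, Z), for which I(S1; Z) = -0.5 log(1 - rho^2); the function names are illustrative.

```python
import math


def capacity_tx_recognition(snr: float, rho_s1z: float) -> float:
    """First form of (45): 0.5 * log(1 + snr / (1 - rho^2)), in nats."""
    if abs(rho_s1z) >= 1.0:
        return math.inf  # perfect re-cognition of the noise
    return 0.5 * math.log(1.0 + snr / (1.0 - rho_s1z ** 2))


def capacity_from_mutual_information(snr: float, mi_nats: float) -> float:
    """Second form of (45): 0.5 * log(1 + snr * exp(2 * I(S1; Z)))."""
    return 0.5 * math.log(1.0 + snr * math.exp(2.0 * mi_nats))


if __name__ == "__main__":
    rho = 0.6
    mi = -0.5 * math.log(1.0 - rho ** 2)  # Gaussian mutual information
    print(capacity_tx_recognition(1.0, rho), capacity_from_mutual_information(1.0, mi))
```

Both calls print the same value, which illustrates why the capacity can be read as an increasing function of the mutual information between the side information and the noise.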

Corollary 5: If $S_1 = 0$, the receiver has re-cognition on the channel noise $Z$, obtained through $S_2$, which is correlated with the 
noise. The capacity in this case is: 

$$C = \frac{1}{2}\log\left(1 + \frac{P}{N}\cdot\frac{1}{1 - \rho_{S_2Z}^2}\right). \tag{46}$$

It is seen that more correlation between $S_2$ and $Z$ results in more re-cognition of the receiver on the channel noise 
and in more capacity. Perfect re-cognition takes place when $\rho_{S_2Z} = \pm 1$ and results in infinite capacity. 

Corollary 6: Suppose $\rho_{S_1S_2} = 0$. If there is no constraint on the correlation between $X$ and $S_1$, then $\rho_{XS_1} = 0$ maximizes the 
transmission rate, as mentioned in (27). Therefore the capacity of the channel is: 

$$C = \frac{1}{2}\log\left(1 + \frac{P}{N}\cdot\frac{1}{1 - \rho_{S_1Z}^2 - \rho_{S_2Z}^2}\right). \tag{47}$$

It is seen that when $\rho_{S_1Z}^2 + \rho_{S_2Z}^2 = 1$, the capacity becomes infinite, even if neither the transmitter nor the receiver 
has perfect knowledge of the channel noise. In this case the transmitter and the receiver each contribute a share of the 
re-cognition on the channel noise, which together leads to complete mitigation of the channel noise. 
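
The shared re-cognition effect of Corollary 6 can be illustrated with a short sketch. It assumes uncorrelated side information (rho_S1S2 = 0), natural logarithms, and a small numerical tolerance `eps` (our choice) to decide when the residual noise factor has reached zero; the function name is illustrative.

```python
import math


def capacity_shared_recognition(snr: float, rho_s1z: float, rho_s2z: float,
                                eps: float = 1e-12) -> float:
    """Capacity of Corollary 6 in nats, for uncorrelated S1 and S2."""
    residual = 1.0 - rho_s1z ** 2 - rho_s2z ** 2
    if residual < -eps:
        # With uncorrelated S1 and S2, the squared correlations cannot sum above 1.
        raise ValueError("rho_s1z^2 + rho_s2z^2 must not exceed 1")
    if residual <= eps:
        return math.inf  # the noise is completely known jointly
    return 0.5 * math.log(1.0 + snr / residual)


if __name__ == "__main__":
    print(capacity_shared_recognition(1.0, 0.5, 0.0))  # transmitter only
    print(capacity_shared_recognition(1.0, 0.5, 0.5))  # both sides contribute
    print(capacity_shared_recognition(1.0, 0.6, 0.8))  # squares sum to 1
```

The last call returns infinity although neither correlation coefficient equals one, which is the situation described at the end of Corollary 6.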








Fig. 8. Capacity of the channel with respect to $\rho_{S_1Z}$ when $\frac{P}{N} = 1$. 

Fig. 9. Capacity of the channel with respect to SNR for five values of $\rho_{S_1Z}$. 


V. Conclusion 

Through a detailed investigation of the Gaussian channel in the presence of two-sided input- and noise-dependent state 
information, we obtained a general achievable rate for the channel and established the capacity theorem. This 
capacity theorem first demonstrates the impact of the transmitter and receiver cognition on the capacity, under a newly 
introduced interpretation, and second shows the effect on the channel capacity of the correlation between the channel 
input and the side information available at the transmitter and at the receiver. Whereas, as expected, the 
cognition of the transmitter and receiver increases the capacity, the correlation between the channel input and the 
side information known at the transmitter decreases it. 

Fig. 10. Capacity of the channel with respect to $I(S_1; Z)$ for five values of SNR. 


VI. Appendix 

Appendix A. 

The proof of Lemma 1: Using the extension of the Cover-Chiang capacity theorem given in (1) to random variables with continuous alphabets, the capacity of our channel can be written as:

$$C = \max_{p(u,x\mid s_1)} \left[I(U;Y,S_2) - I(U;S_1)\right] \tag{48}$$

where the maximum is over all distributions $p(y,x,u,s_1,s_2)$ in $\mathcal{P}_{\rho_{XS_1}}$ having properties GC.1-GC.4. Since $\mathcal{P}^{*}_{\rho_{XS_1}} \subseteq \mathcal{P}_{\rho_{XS_1}}$ we have:

$$\begin{aligned}
C &\ge \max_{p^{*}(u,x\mid s_1)} \left[I(U;Y,S_2) - I(U;S_1)\right] && (49)\\
&= \max_{p^{*}(u\mid x,s_1)\,p^{*}(x\mid s_1)} \left[I(U;Y,S_2) - I(U;S_1)\right] && (50)\\
&= \max_{\alpha} \left[I(U;Y,S_2) - I(U;S_1)\right] && (51)
\end{aligned}$$

where the expression $I(U;Y,S_2) - I(U;S_1)$ in (51) is calculated for the distributions in $\mathcal{P}^{*}_{\rho_{XS_1}}$ having properties GC.1-GC.6. Thus, defining $R(\alpha) = I(U;Y,S_2) - I(U;S_1)$, we have:

$$C \ge \max_{\alpha} R(\alpha) = R(\alpha^{*}), \tag{52}$$

therefore $R(\alpha^{*})$ is a lower bound for the channel capacity. To compute $R(\alpha^{*})$, we write:

$$I(U;Y,S_2) = H(U) + H(Y,S_2) - H(U,Y,S_2) \tag{53}$$

and

$$I(U;S_1) = H(U) + H(S_1) - H(U,S_1). \tag{54}$$

For $H(Y,S_2)$ we have:

$$H(Y,S_2) = \frac{1}{2}\log\left((2\pi e)^{2}\det\left(\mathrm{cov}(Y,S_2)\right)\right) \tag{55}$$

where

$$\mathrm{cov}(Y,S_2) = [e_{ij}]_{2\times 2} \tag{56}$$

and

$$\begin{aligned}
e_{11} &= P + Q_1 + Q_2 + N + 2A_1 + 2A_2 + 2B + 2L_0 + 2L_1 + 2L_2,\\
e_{12} &= e_{21} = A_2 + B + Q_2 + L_2 \quad\text{and}\quad e_{22} = Q_2,
\end{aligned} \tag{57}$$

where $P$, the $Q_i$'s, $N$, the $A_i$'s, the $L_i$'s and $B$ are defined in the previous section. Therefore

$$\det\left(\mathrm{cov}(Y,S_2)\right) = d_{Q_1N} + d_{PN} + d_{PQ_1} + 2d_{L_0L_1} - 2d_{PL_1} - 2d_{Q_1L_0} \tag{58}$$

where the terms are defined in (15).

For $H(U,Y,S_2)$ we have:

$$H(U,Y,S_2) = \frac{1}{2}\log\left((2\pi e)^{3}\det\left(\mathrm{cov}(U,Y,S_2)\right)\right) \tag{59}$$

where

$$\mathrm{cov}(U,Y,S_2) = [e_{ij}]_{3\times 3} \tag{60}$$

and

$$\begin{aligned}
e_{11} &= P + \alpha^{2}Q_1 + 2\alpha A_1,\\
e_{12} &= e_{21} = P + (\alpha+1)A_1 + \alpha Q_1 + \alpha B + \alpha L_1 + A_2 + L_0,\\
e_{13} &= e_{31} = \alpha B + A_2,\\
e_{22} &= P + Q_1 + Q_2 + N + 2A_1 + 2A_2 + 2B + 2L_0 + 2L_1 + 2L_2,\\
e_{23} &= e_{32} = A_2 + B + Q_2 + L_2 \quad\text{and}\quad e_{33} = Q_2.
\end{aligned} \tag{61}$$

After some manipulations we have:

$$\det\left(\mathrm{cov}(U,Y,S_2)\right) = (\alpha-1)^{2}d_N + \alpha^{2}d_P + 2\alpha(\alpha-1)d_{L_0} + 2\alpha d_{A_1} + 2(\alpha-1)d_{L_1} + d_{Q_1}. \tag{62}$$

For $H(S_1)$ and $H(U,S_1)$ we have:

$$H(S_1) = \frac{1}{2}\log\left((2\pi e)Q_1\right), \tag{63}$$

$$H(U,S_1) = \frac{1}{2}\log\left((2\pi e)^{2}\det\left(\mathrm{cov}(U,S_1)\right)\right) \tag{64}$$

where

$$\mathrm{cov}(U,S_1) = \begin{bmatrix} \alpha^{2}Q_1 + P + 2\alpha A_1 & \alpha Q_1 + A_1 \\ \alpha Q_1 + A_1 & Q_1 \end{bmatrix} \tag{65}$$

and the determinant of this matrix is:

$$\det\left(\mathrm{cov}(U,S_1)\right) = d_{Q_2N}. \tag{66}$$

Substituting (55), (59), (63) and (64) in (53) and (54), we obtain:

$$R(\alpha) = \frac{1}{2}\log\left(\frac{d_{Q_2N}\left(d_{Q_1N} + d_{PN} + d_{PQ_1} + 2d_{L_0L_1} - 2d_{PL_1} - 2d_{Q_1L_0}\right)}{Q_1\left[(\alpha-1)^{2}d_N + \alpha^{2}d_P + 2\alpha(\alpha-1)d_{L_0} + 2\alpha d_{A_1} + 2(\alpha-1)d_{L_1} + d_{Q_1}\right]}\right). \tag{67}$$

The optimum value of $\alpha$, corresponding to the maximum of $R(\alpha)$, is obtained by minimizing the bracketed quadratic in the denominator of (67):

$$\alpha^{*} = \frac{(d_N + d_{L_0}) - (d_{A_1} + d_{L_1})}{d_N + d_P + 2d_{L_0}}. \tag{68}$$

Substituting $\alpha^{*}$ from (68) into (67) and using equations (35), (29)-(32) and (26), we finally conclude that $R(\alpha^{*})$ equals $R_0$ in (39). Therefore $R_0$ in (39) is a lower bound for the capacity of the channel defined by properties GC.1-GC.4 in III-A (details of the computations are omitted for brevity). □
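
The determinant-based computation of R(α) can be reproduced numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it takes a covariance matrix `K` of (X, S1, S2, Z) in that order, forms U = X + αS1 and Y = X + S1 + S2 + Z as linear maps, and evaluates R(α) = I(U;Y,S2) - I(U;S1) from log-determinants (the 2πe factors cancel). The grid search in `best_alpha` and all function names are our own choices.

```python
import numpy as np


def rate_alpha(alpha: float, K: np.ndarray) -> float:
    """R(alpha) in nats for jointly Gaussian (X, S1, S2, Z) with covariance K."""
    u = np.array([1.0, alpha, 0.0, 0.0])   # U = X + alpha * S1
    y = np.array([1.0, 1.0, 1.0, 1.0])     # Y = X + S1 + S2 + Z
    s1 = np.array([0.0, 1.0, 0.0, 0.0])
    s2 = np.array([0.0, 0.0, 1.0, 0.0])

    def logdet(*rows):
        # log-determinant of the covariance of the stacked linear combinations
        M = np.vstack(rows)
        return np.linalg.slogdet(M @ K @ M.T)[1]

    # H(Y,S2) - H(U,Y,S2) - H(S1) + H(U,S1), with the (2*pi*e) factors cancelled
    return 0.5 * (logdet(y, s2) + logdet(u, s1) - logdet(u, y, s2) - logdet(s1))


def best_alpha(K: np.ndarray, grid=np.linspace(0.0, 1.0, 1001)) -> float:
    """Grid-search maximizer of R(alpha); a stand-in for the closed form (68)."""
    rates = [rate_alpha(a, K) for a in grid]
    return float(grid[int(np.argmax(rates))])


if __name__ == "__main__":
    # Costa-like special case: all four variables mutually uncorrelated.
    P, Q1, Q2, N = 2.0, 3.0, 1.0, 1.0
    K = np.diag([P, Q1, Q2, N])
    a = best_alpha(K)
    print(a, rate_alpha(a, K), 0.5 * np.log(1.0 + P / N))
```

In this uncorrelated special case the maximizer is close to P/(P+N) and the maximum rate is close to 0.5 log(1 + P/N), as in Costa's dirty paper coding.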


Appendix B.

The proof of Lemma 2: For all distributions $p(y,x,u,s_1,s_2)$ in $\mathcal{P}_{\rho_{XS_1}}$ defined by properties GC.1-GC.4, we have:

$$\begin{aligned}
I(U;Y,S_2) - I(U;S_1) &= -H(U\mid Y,S_2) + H(U\mid S_1) && (69)\\
&\le -H(U\mid Y,S_1,S_2) + H(U\mid S_1) && (70)\\
&= -H(U\mid Y,S_1,S_2) + H(U\mid S_1,S_2) && (71)\\
&= I(U;Y\mid S_1,S_2) && (72)\\
&\le I(X;Y\mid S_1,S_2) && (73)
\end{aligned}$$

where (70) follows from the fact that conditioning reduces entropy, (71) follows from the Markov chain $S_2 \to S_1 \to UX$, and (73) follows from the Markov chain $U \to XS_1S_2 \to Y$. Both chains hold for every distribution of the form assumed in the channel definition, including the distributions in the set $\mathcal{P}_{\rho_{XS_1}}$. From (73) we can write:

$$\begin{aligned}
C &= \max_{p(u,x\mid s_1)} \left[I(U;Y,S_2) - I(U;S_1)\right] && (74)\\
&\le \max_{p(x\mid s_1)} \left[I(X;Y\mid S_1,S_2)\right]. && (75)
\end{aligned}$$

From (75) it is seen that the capacity of the channel cannot be greater than the capacity when both $S_1$ and $S_2$ are available at both the transmitter and the receiver, which is physically predictable. To compute (75) we write:

$$\begin{aligned}
I(X;Y\mid S_1,S_2) &= H(Y\mid S_1,S_2) - H(Y\mid X,S_1,S_2) && (76)\\
&= H(X+S_1+S_2+Z\mid S_1,S_2) - H(X+S_1+S_2+Z\mid X,S_1,S_2) && (77)\\
&= H(X+Z\mid S_1,S_2) - H(Z\mid X,S_1,S_2) && (78)\\
&= H(X+Z\mid S_1,S_2) - H(Z\mid S_1,S_2) && (79)\\
&= H\left((X+Z),S_1,S_2\right) - H(S_1,S_2,Z), && (80)
\end{aligned}$$

where (79) follows from the Markov chain $X \to (S_1,S_2) \to Z$. Hence, the maximum value in (75) occurs when $H\left((X+Z),S_1,S_2\right)$ is maximum. Since $S_1$, $S_2$ and $Z$ are Gaussian, the maximum in (75) is achieved when $(X,S_1,S_2)$ are jointly Gaussian and $X$ has its maximum power $P$. In other words, $I(X;Y\mid S_1,S_2)$ must be computed for the distribution $p^{*}(y,x,s_1,s_2)$ having the properties GC.1-GC.6. Let $I^{*}(X;Y\mid S_1,S_2)$ be the maximum value in (75). We have:

$$C \le I^{*}(X;Y\mid S_1,S_2). \tag{81}$$

To compute $I^{*}(X;Y\mid S_1,S_2)$, we first compute $H\left((X+Z),S_1,S_2\right)$ for the distribution $p^{*}(y,x,s_1,s_2)$ defined by properties GC.1-GC.6:

$$H\left((X+Z),S_1,S_2\right) = \frac{1}{2}\log\left((2\pi e)^{3}\det\left(\mathrm{cov}\left((X+Z),S_1,S_2\right)\right)\right) \tag{82}$$

where

$$\mathrm{cov}\left((X+Z),S_1,S_2\right) = E\begin{bmatrix} (X+Z)^{2} & (X+Z)S_1 & (X+Z)S_2 \\ (X+Z)S_1 & S_1^{2} & S_1S_2 \\ (X+Z)S_2 & S_1S_2 & S_2^{2} \end{bmatrix} \tag{83}$$

$$= \begin{bmatrix} P + N + 2L_0 & A_1 + L_1 & A_2 + L_2 \\ A_1 + L_1 & Q_1 & B \\ A_2 + L_2 & B & Q_2 \end{bmatrix} \tag{84}$$

and the determinant:

$$\det\left(\mathrm{cov}\left((X+Z),S_1,S_2\right)\right) = d_N + 2d_{L_0} + d_P, \tag{85}$$

and the other term in (80):

$$H(S_1,S_2,Z) = \frac{1}{2}\log\left((2\pi e)^{3}d_P\right) \tag{86}$$

where the terms are defined in (15).

Substituting (85) in (82), and from (86), we have:

$$I^{*}(X;Y\mid S_1,S_2) = \frac{1}{2}\log\left(1 + \frac{d_N + 2d_{L_0}}{d_P}\right). \tag{87}$$

Rewriting (87) in terms of $\sigma_X$, $\sigma_{S_1}$, $\sigma_{S_2}$, $\sigma_Z$, $\rho_{S_1Z}$, $\rho_{S_2Z}$ and $\rho_{S_1S_2}$ using (29)-(32) and (35), and taking into account the two Markovity results (26) and (38), we finally conclude that (details of the manipulations are omitted for brevity):

$$I^{*}(X;Y\mid S_1,S_2) = \frac{1}{2}\log\left(1 + \frac{P}{N}\cdot\frac{\left(1-\rho_{XS_1}^{2}\right)\left(1-\rho_{S_1S_2}^{2}\right)}{1 - \rho_{S_1S_2}^{2} - \rho_{S_1Z}^{2} - \rho_{S_2Z}^{2} + 2\rho_{S_1S_2}\rho_{S_1Z}\rho_{S_2Z}}\right). \tag{88}$$

Hence, $C$ in (36) is an upper bound for the capacity of the channel when we have the Markov chain $X \to (S_1,S_2) \to Z$. □
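
Since the manipulations behind the upper bound are omitted, a numerical cross-check may help. The sketch below is our own illustration: it builds a jointly Gaussian covariance of (X, S1, S2, Z) that satisfies the two Markov conditions, evaluates the entropy difference in (80) through log-determinants, and compares it with the closed form that these conditions imply for Gaussian variables. The closed form in `upper_bound_closed_form`, the parameter values and all function names are assumptions of this sketch.

```python
import numpy as np


def build_covariance(P, Q1, Q2, N, r_x1, r_12, r_1z, r_2z):
    """Covariance of (X, S1, S2, Z) obeying S2 -> S1 -> X and X -> (S1,S2) -> Z."""
    r_x2 = r_x1 * r_12                                  # Markov chain S2 -> S1 -> X
    R_s = np.array([[1.0, r_12], [r_12, 1.0]])
    # Zero partial correlation of X and Z given (S1, S2) fixes rho_XZ.
    r_xz = np.array([r_x1, r_x2]) @ np.linalg.solve(R_s, np.array([r_1z, r_2z]))
    R = np.array([[1.0, r_x1, r_x2, r_xz],
                  [r_x1, 1.0, r_12, r_1z],
                  [r_x2, r_12, 1.0, r_2z],
                  [r_xz, r_1z, r_2z, 1.0]])
    sd = np.sqrt([P, Q1, Q2, N])
    return R * np.outer(sd, sd)


def upper_bound_from_determinants(K):
    """0.5 * [log det cov(X+Z, S1, S2) - log det cov(S1, S2, Z)], cf. (80)."""
    M = np.array([[1.0, 0.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])                # rows: X+Z, S1, S2
    top = np.linalg.slogdet(M @ K @ M.T)[1]
    bottom = np.linalg.slogdet(K[1:, 1:])[1]            # cov(S1, S2, Z)
    return 0.5 * (top - bottom)


def upper_bound_closed_form(P, N, r_x1, r_12, r_1z, r_2z):
    """Closed form implied by the two Markov conditions (assumed here)."""
    num = (1.0 - r_x1 ** 2) * (1.0 - r_12 ** 2)
    den = 1.0 - r_12 ** 2 - r_1z ** 2 - r_2z ** 2 + 2.0 * r_12 * r_1z * r_2z
    return 0.5 * np.log(1.0 + (P / N) * num / den)


if __name__ == "__main__":
    args = (0.5, 0.3, 0.4, 0.2)  # rho_XS1, rho_S1S2, rho_S1Z, rho_S2Z (hypothetical)
    K = build_covariance(2.0, 1.5, 0.5, 1.0, *args)
    print(upper_bound_from_determinants(K), upper_bound_closed_form(2.0, 1.0, *args))
```

The two printed numbers agree, and setting the noise correlations to zero reduces the closed form to the expression in (42).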


Appendix C. 

Lemma 3: Two continuous random variables $X$ and $S$ with probability density functions $f_X(x)$ and $f_S(s)$ can be correlated to each other with a specific correlation coefficient $\rho_{XS}$.

Proof: Suppose $F_X(x)$ and $F_S(s)$ are the distribution functions of $f_X(x)$ and $f_S(s)$ respectively. If $X$ and $S$ are jointly distributed with the joint density function $f_{X,S}(x,s)$ given below, we prove that the correlation coefficient is $\rho_{XS}$:

$$f_{X,S}(x,s) = f_X(x)f_S(s)\left[1 + \rho\left(2F_X(x) - 1\right)\left(2F_S(s) - 1\right)\right] \tag{89}$$

in which

$$\rho = \frac{\sigma_X\sigma_S}{\alpha_X\alpha_S}\,\rho_{XS} \tag{90}$$

with

$$\alpha_X = \int_{-\infty}^{+\infty} x f_X(x)\left(2F_X(x) - 1\right)dx \tag{91}$$

and

$$\alpha_S = \int_{-\infty}^{+\infty} s f_S(s)\left(2F_S(s) - 1\right)ds. \tag{92}$$

First we note that (89) is a joint density function with marginal densities $f_X(x)$ and $f_S(s)$ [24, p.176]. Then we need to prove that $E\{XS\} = \sigma_X\sigma_S\rho_{XS} + E\{X\}E\{S\}$. From (89) we have:

$$\begin{aligned}
E\{XS\} &= \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x s f_{X,S}(x,s)\,dx\,ds && (93)\\
&= \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x s f_X(x)f_S(s)\left[1 + \rho\left(2F_X(x) - 1\right)\left(2F_S(s) - 1\right)\right]dx\,ds && (94)\\
&= E\{X\}E\{S\} + \rho\,\alpha_X\alpha_S. && (95)
\end{aligned}$$

To complete the proof, we need to show that $\alpha_X$ and $\alpha_S$ in (91) and (92) exist and have nonzero values. We can show that:

$$\int_{-\infty}^{+\infty} F_X(x)\left(1 - F_X(x)\right)dx = \alpha_X + \Big[x F_X(x)\left(1 - F_X(x)\right)\Big]_{-\infty}^{+\infty}. \tag{96}$$

The second expression on the right-hand side of (96) is equal to zero because $F_X(\pm\infty)\left(1 - F_X(\pm\infty)\right)$ is exactly equal to zero by definition. The integrand on the left-hand side of (96) is a positive and continuous function of $x$, and therefore the integral exists and has a nonzero positive value. So $\alpha_X$ exists and is nonzero. The same argument is valid for $\alpha_S$. □
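
The construction in (89)-(92) can be tried out numerically. The sketch below is an illustration under our own assumptions: both marginals are taken to be uniform on (0, 1), so that f = 1 and F(x) = x, and the integrals are approximated with a midpoint rule on an `n`-point grid. It also reports the minimum of the bracketed factor in (89), since (89) is a valid density only when that factor stays nonnegative, which limits the correlations reachable this way. The function name is illustrative.

```python
import numpy as np


def fgm_correlation_uniform(rho_target: float, n: int = 400):
    """Return (achieved correlation, rho of (90), min density) for uniform marginals."""
    x = (np.arange(n) + 0.5) / n            # midpoints of the grid on (0, 1)
    w = 1.0 / n                             # cell width
    mean = np.sum(x) * w
    var = np.sum((x - mean) ** 2) * w       # sigma_X * sigma_S for identical marginals
    alpha = np.sum(x * (2.0 * x - 1.0)) * w  # (91) with f = 1 and F(x) = x
    rho = rho_target * var / alpha ** 2      # (90)
    X, S = np.meshgrid(x, x, indexing="ij")
    density = 1.0 + rho * (2.0 * X - 1.0) * (2.0 * S - 1.0)   # (89) with f_X = f_S = 1
    exs = np.sum(X * S * density) * w * w    # E{XS} under the joint density
    return (exs - mean * mean) / var, rho, float(density.min())


if __name__ == "__main__":
    achieved, rho, dmin = fgm_correlation_uniform(0.2)
    print(achieved, rho, dmin)
```

For a target correlation of 0.2 the parameter of (90) comes out near 0.6 and the density stays positive. For uniform marginals, targets above 1/3 would need a parameter larger than one and would make (89) negative somewhere.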


Appendix D. 

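Lemma 4: Consider three zero-mean random variables $(X, S_1, S_2)$ with covariance matrix $K$:

$$K = E\begin{bmatrix} X^{2} & XS_1 & XS_2 \\ XS_1 & S_1^{2} & S_1S_2 \\ XS_2 & S_1S_2 & S_2^{2} \end{bmatrix} = \begin{bmatrix} \sigma_X^{2} & \sigma_X\sigma_{S_1}\rho_{XS_1} & \sigma_X\sigma_{S_2}\rho_{XS_2} \\ \sigma_X\sigma_{S_1}\rho_{XS_1} & \sigma_{S_1}^{2} & \sigma_{S_1}\sigma_{S_2}\rho_{S_1S_2} \\ \sigma_X\sigma_{S_2}\rho_{XS_2} & \sigma_{S_1}\sigma_{S_2}\rho_{S_1S_2} & \sigma_{S_2}^{2} \end{bmatrix}. \tag{97}$$

Suppose $(S_1, S_2)$ are jointly Gaussian random variables. Then, if $(X, S_1, S_2)$ form the Markov chain $S_2 \to S_1 \to X$ (even if $X$ is not Gaussian), we have:

$$\rho_{XS_2} = \rho_{XS_1}\rho_{S_1S_2} \tag{98}$$

or equivalently:

$$E\{S_1^{2}\}E\{XS_2\} = E\{XS_1\}E\{S_1S_2\}. \tag{99}$$

Proof: we can write:

$$\begin{aligned}
\rho_{XS_2} = \frac{E\{XS_2\}}{\sigma_X\sigma_{S_2}} &= \frac{E\{E\{XS_2\mid S_1\}\}}{\sigma_X\sigma_{S_2}} && (100)\\
&= \frac{E\{E\{X\mid S_1\}E\{S_2\mid S_1\}\}}{\sigma_X\sigma_{S_2}} && (101)\\
&= \frac{\rho_{S_1S_2}}{\sigma_X\sigma_{S_1}}E\{S_1E\{X\mid S_1\}\} && (102)\\
&= \frac{\rho_{S_1S_2}}{\sigma_X\sigma_{S_1}}E\{XS_1\} && (103)\\
&= \rho_{XS_1}\rho_{S_1S_2} && (104)
\end{aligned}$$

where (101) follows from the Markov chain $S_2 \to S_1 \to X$, (102) follows from the Gaussianity of $(S_1, S_2)$ and the fact that $E\{S_2\mid S_1\} = \frac{\sigma_{S_2}\rho_{S_1S_2}}{\sigma_{S_1}}S_1$, and (103) follows from the general rule that for random variables $A$ and $B$ we have $E\{g_1(A)g_2(B)\} = E\{g_1(A)E\{g_2(B)\mid A\}\}$ [24, p.234]. □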
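
Lemma 4 can be illustrated by simulation. The sketch below is a Monte Carlo check under our own assumptions: S1 is standard Gaussian, S2 is built to be jointly Gaussian with S1, and X is a deliberately non-Gaussian function of S1 plus independent uniform noise, so that S2 -> S1 -> X holds by construction. The particular choice of `tanh`, the sample size and the seed are arbitrary.

```python
import numpy as np


def empirical_correlations(r_12: float, n: int = 400_000, seed: int = 0):
    """Return sample estimates of (rho_XS1, rho_XS2, rho_S1S2)."""
    rng = np.random.default_rng(seed)
    s1 = rng.standard_normal(n)
    # S2 is jointly Gaussian with S1, with correlation r_12.
    s2 = r_12 * s1 + np.sqrt(1.0 - r_12 ** 2) * rng.standard_normal(n)
    # X is non-Gaussian and depends on S2 only through S1.
    x = np.tanh(s1) + rng.uniform(-0.5, 0.5, n)
    c = np.corrcoef(np.vstack([x, s1, s2]))
    return c[0, 1], c[0, 2], c[1, 2]


if __name__ == "__main__":
    r_x1, r_x2, r_s = empirical_correlations(0.7)
    print(r_x2, r_x1 * r_s)   # the two values should be close, cf. (98)
```

The sample value of the correlation between X and S2 matches the product of the other two correlations up to Monte Carlo error, even though X is not Gaussian.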
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